A Textual Modus Operandi: Surrey's Simple System for Author Identification Notebook for PAN at CLEF 2013

Authors

  • Anna Vartapetiance
  • Lee Gillam
Abstract

Detecting deceptions of various kinds may be variously possible, but has little value if the deceiver cannot be identified. In this paper, we discuss our approach to Authorship Attribution, which uses vector similarity with a frequency-mean-variance framework for patterns of stopwords (no more than ten). High-frequency individual occurrences, and patterns of co-occurrence, can be used as identifiers of an author's style, and the approach operates similarly across certain languages without prior linguistic knowledge. This simple system achieved F1 values of 0.66, 0.74 and 0.78 for the Early Bird, Final, and Post submission assessments of the Train Corpus. We cannot yet offer further explanation as the Test Corpus is not available at the time of writing.
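
The abstract does not spell out the framework, so the following is only a minimal sketch of one way such a comparison could look, not the authors' implementation. It assumes that each stopword contributes three features (relative frequency, plus the mean and variance of the gaps between its successive occurrences) and that documents are compared by cosine similarity. The stopword list, the length normalisation, the `same_author` helper and its threshold are all hypothetical choices made for illustration.

```python
import math
import re

# Hypothetical list of ten English stopwords; the paper's actual list is not given here.
STOPWORDS = ["the", "of", "and", "a", "in", "to", "is", "was", "it", "for"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def profile(text):
    """Build a feature vector: for each stopword, its relative frequency and
    the mean and variance of the gaps between consecutive occurrences.
    Dividing by document length is an assumption to keep features comparable."""
    tokens = tokenize(text)
    n = len(tokens) or 1
    vec = []
    for word in STOPWORDS:
        positions = [i for i, tok in enumerate(tokens) if tok == word]
        freq = len(positions) / n
        gaps = [b - a for a, b in zip(positions, positions[1:])]
        if gaps:
            mean = sum(gaps) / len(gaps)
            var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
        else:
            mean = var = 0.0
        vec.extend([freq, mean / n, var / (n * n)])
    return vec


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def same_author(known_texts, unknown_text, threshold=0.9):
    """Hypothetical decision rule: accept if the average similarity between the
    unknown text and the known texts reaches an (arbitrary) threshold."""
    unknown = profile(unknown_text)
    scores = [cosine(profile(t), unknown) for t in known_texts]
    return sum(scores) / len(scores) >= threshold
```

In this sketch nothing language-specific is needed beyond the stopword list itself, which is in keeping with the abstract's claim that the approach works without prior linguistic knowledge; swapping the list would be the only change for another language.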

Similar resources

Readability for Author Profiling? Notebook for PAN at CLEF 2013

This paper briefly describes the approach taken to the Author Profiling task at PAN 13. It describes the simple features used, and the origins in thinking around text readability as a mechanism for identification, and the predictive model used which may have beneficially omitted classes, as well as offering commentary on the results obtained.

Semantic-based Features for Author Profiling Identification: First insights Notebook for PAN at CLEF 2013

In this article we present a semantic-based approach concerning the identification of particular author’s traits, such as age and gender, from social media texts. The model here described is intended to provide information on different levels of analysis: from textual markers to semantics. Different classifiers were used to assess the performance and scope of the model.

Vector Space Model and Overlap Metric for Author Identification Notebook for PAN at CLEF 2013

This paper describes our entry for the Author Identification task at PAN 2013. The Author Identification task was performed using a combination of Vector Space Model [1] (VSM) and Similarity Overlap Metric [3] (SOM) on the character n-grams extracted from the documents related to an author and the document of question. A combination of the VSM and SOM provided an overall F-measure, precision an...

Style-based Distance Features for Author Verification Notebook for PAN at CLEF 2013

In this paper we present the approach we took in our participation to the PAN 2013 Author Profiling task. It is an adaptation of our system submitted for author identification, assuming that a profile category (authors belonging to the same gender and age group categories) can be analyzed in the same way as an author’s style.

A Graph Based Authorship Identification Approach: Notebook for PAN at CLEF 2015

The paper describes our approach for the Authorship Identification task at the PAN CLEF 2015. We extract textual patterns based on features obtained from shortest path walks over Integrated Syntactic Graphs (ISG). Then we calculate a similarity between the unknown document and the known document with these patterns. The approach uses a predefined threshold in order to decide if the unknown docu...

Publication date: 2013